Dataset statistics
| Number of variables | 25 |
|---|---|
| Number of observations | 665249 |
| Missing cells | 279371 |
| Missing cells (%) | 1.7% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 126.9 MiB |
| Average record size in memory | 200.0 B |
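The derived figures in this table can be re-computed from the raw counts. The following is a minimal sketch that uses only numbers quoted in this report; the variable names are illustrative and are not part of the profiling tool.

```python
# Figures copied from the report; the names are illustrative only.
n_rows = 665_249          # Number of observations
n_cols = 25               # Number of variables
missing_cells = 279_371   # Missing cells
total_mib = 126.9         # Total size in memory (MiB)

# Per-variable missing counts as reported in the variable sections.
missing_by_column = {
    "risk_factor": 240_418,
    "C_previous": 18_711,
    "duration_previous": 18_711,
    "car_value": 1_531,
}

# Share of missing cells over all rows x columns.
missing_pct = 100 * missing_cells / (n_rows * n_cols)

# Average bytes per row, converting MiB to bytes first.
avg_record_bytes = total_mib * 1024**2 / n_rows

print(f"Missing cells (%): {missing_pct:.1f}")
print(f"Average record size: {avg_record_bytes:.1f} B")
print("Per-column counts add up:", sum(missing_by_column.values()) == missing_cells)
```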
Variable types
| CAT | 11 |
|---|---|
| NUM | 9 |
| BOOL | 5 |
Warnings
| Warning | Type |
|---|---|
| time has a high cardinality: 1204 distinct values | High cardinality |
| age_youngest is highly correlated with age_oldest | High correlation |
| age_oldest is highly correlated with age_youngest | High correlation |
| risk_factor has 240418 (36.1%) missing values | Missing |
| C_previous has 18711 (2.8%) missing values | Missing |
| duration_previous has 18711 (2.8%) missing values | Missing |
| day has 140539 (21.1%) zeros | Zeros |
| duration_previous has 24926 (3.7%) zeros | Zeros |
Reproduction
| Analysis started | 2022-03-22 17:04:46.899273 |
|---|---|
| Analysis finished | 2022-03-22 17:07:21.102110 |
| Duration | 2 minutes and 34.2 seconds |
| Software version | pandas-profiling v2.9.0 |
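A report of this kind could be regenerated along the following lines. This is a hedged sketch, not the exact command used for this analysis: the file paths and the report title are hypothetical, and the configuration actually used is not reproduced here. The third-party imports are placed inside the function so that the helper can be defined without the packages installed.

```python
def build_profile_report(csv_path, html_path):
    """Profile a CSV file and write an HTML report (sketch, default settings)."""
    # Lazy imports: pandas and pandas-profiling (v2.x package name) are only
    # needed when the function is actually called.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Dataset profile")  # title is an assumption
    profile.to_file(html_path)
    return profile

# Example call with hypothetical paths:
# build_profile_report("data.csv", "report.html")
```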
customer_ID
Real number (ℝ≥0)
| Distinct | 97009 |
|---|---|
| Distinct (%) | 14.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10076553.44 |
|---|---|
| Minimum | 10000000 |
| Maximum | 10152724 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 5.1 MiB |
Quantile statistics
| Minimum | 10000000 |
|---|---|
| 5-th percentile | 10007770 |
| Q1 | 10038523 |
| median | 10076403 |
| Q3 | 10114696 |
| 95-th percentile | 10145255.6 |
| Maximum | 10152724 |
| Range | 152724 |
| Interquartile range (IQR) | 76173 |
Descriptive statistics
| Standard deviation | 44049.77859 |
|---|---|
| Coefficient of variation (CV) | 0.004371512428 |
| Kurtosis | -1.195788963 |
| Mean | 10076553.44 |
| Median Absolute Deviation (MAD) | 38079 |
| Skewness | -0.002009957296 |
| Sum | 6.7034171e+12 |
| Variance | 1940382994 |
| Monotonicity | Increasing |
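Several of the quantile and descriptive statistics are simple functions of the others. As a minimal sketch, the snippet below re-derives the range, IQR, coefficient of variation and variance for customer_ID from the values printed above; the inputs are rounded as shown in the report, so the results agree only up to that rounding.

```python
import math

# Values copied from the customer_ID tables above.
minimum, maximum = 10_000_000, 10_152_724
q1, q3 = 10_038_523, 10_114_696
mean = 10_076_553.44
std = 44_049.77859

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance is the squared standard deviation

print(value_range, iqr)
print(f"CV = {cv:.6f}")
print(f"Variance = {variance:.0f}")
```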
| Value | Count | Frequency (%) | |
| 10129147 | 13 | < 0.1% | |
| 10028761 | 13 | < 0.1% | |
| 10007398 | 13 | < 0.1% | |
| 10107396 | 13 | < 0.1% | |
| 10103851 | 13 | < 0.1% | |
| 10110688 | 13 | < 0.1% | |
| 10089345 | 13 | < 0.1% | |
| 10150752 | 13 | < 0.1% | |
| 10063394 | 13 | < 0.1% | |
| 10075547 | 13 | < 0.1% | |
| Other values (96999) | 665119 | > 99.9% |
| Value | Count | Frequency (%) | |
| 10000000 | 9 | < 0.1% | |
| 10000005 | 6 | < 0.1% | |
| 10000007 | 8 | < 0.1% | |
| 10000013 | 4 | < 0.1% | |
| 10000014 | 6 | < 0.1% |
| Value | Count | Frequency (%) | |
| 10152724 | 6 | < 0.1% | |
| 10152723 | 3 | < 0.1% | |
| 10152721 | 6 | < 0.1% | |
| 10152720 | 8 | < 0.1% | |
| 10152718 | 9 | < 0.1% |
shopping_pt
Real number (ℝ≥0)
| Distinct | 13 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.219965757 |
|---|---|
| Minimum | 1 |
| Maximum | 13 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 5.1 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 4 |
| Q3 | 6 |
| 95-th percentile | 8 |
| Maximum | 13 |
| Range | 12 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 2.394368765 |
|---|---|
| Coefficient of variation (CV) | 0.5673905674 |
| Kurtosis | -0.5640492606 |
| Mean | 4.219965757 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.4697521391 |
| Sum | 2807328 |
| Variance | 5.733001784 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 1 | 97009 | 14.6% | |
| 2 | 97009 | 14.6% | |
| 3 | 97009 | 14.6% | |
| 4 | 91441 | 13.7% | |
| 5 | 83440 | 12.5% | |
| 6 | 72171 | 10.8% | |
| 7 | 56548 | 8.5% | |
| 8 | 37958 | 5.7% | |
| 9 | 20710 | 3.1% | |
| 10 | 8725 | 1.3% | |
| Other values (3) | 3229 | 0.5% |
| Value | Count | Frequency (%) | |
| 1 | 97009 | 14.6% | |
| 2 | 97009 | 14.6% | |
| 3 | 97009 | 14.6% | |
| 4 | 91441 | 13.7% | |
| 5 | 83440 | 12.5% |
| Value | Count | Frequency (%) | |
| 13 | 50 | < 0.1% | |
| 12 | 525 | 0.1% | |
| 11 | 2654 | 0.4% | |
| 10 | 8725 | 1.3% | |
| 9 | 20710 | 3.1% |
record_type
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.1 MiB |
| Value | Count | Frequency (%) | |
| 0 | 568240 | 85.4% | |
| 1 | 97009 | 14.6% |
day
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.969429492 |
|---|---|
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 140539 |
| Zeros (%) | 21.1% |
| Memory size | 5.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 4 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.4534705 |
|---|---|
| Coefficient of variation (CV) | 0.7380160122 |
| Kurtosis | -1.168196957 |
| Mean | 1.969429492 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.1306439815 |
| Sum | 1310161 |
| Variance | 2.112576494 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 0 | 140539 | 21.1% | |
| 1 | 136921 | 20.6% | |
| 2 | 133453 | 20.1% | |
| 4 | 123639 | 18.6% | |
| 3 | 121342 | 18.2% | |
| 5 | 8378 | 1.3% | |
| 6 | 977 | 0.1% |
| Value | Count | Frequency (%) | |
| 0 | 140539 | 21.1% | |
| 1 | 136921 | 20.6% | |
| 2 | 133453 | 20.1% | |
| 3 | 121342 | 18.2% | |
| 4 | 123639 | 18.6% |
| Value | Count | Frequency (%) | |
| 6 | 977 | 0.1% | |
| 5 | 8378 | 1.3% | |
| 4 | 123639 | 18.6% | |
| 3 | 121342 | 18.2% | |
| 2 | 133453 | 20.1% |
time
Categorical
| Distinct | 1204 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.1 MiB |
| Value | Count | Frequency (%) | |
| 14:35 | 1553 | 0.2% | |
| 14:59 | 1489 | 0.2% | |
| 14:57 | 1471 | 0.2% | |
| 15:22 | 1454 | 0.2% | |
| 15:20 | 1453 | 0.2% | |
| 15:09 | 1452 | 0.2% | |
| 14:40 | 1441 | 0.2% | |
| 14:16 | 1440 | 0.2% | |
| 14:39 | 1435 | 0.2% | |
| 15:13 | 1432 | 0.2% | |
| Other values (1194) | 650629 | 97.8% |
Unique
| Unique | 99 |
|---|---|
| Unique (%) | < 0.1% |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 5 |
| Min length | 5 |
state
Categorical
| Distinct | 36 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.1 MiB |
| Value | Count | Frequency (%) | |
| FL | 106287 | 16.0% | |
| NY | 91627 | 13.8% | |
| PA | 60677 | 9.1% | |
| OH | 44537 | 6.7% | |
| MD | 28443 | 4.3% | |
| IN | 25295 | 3.8% | |
| WA | 25188 | 3.8% | |
| CO | 24409 | 3.7% | |
| AL | 23560 | 3.5% | |
| CT | 19353 | 2.9% | |
| Other values (26) | 215873 | 32.4% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 2 |
| Min length | 2 |
location
Real number (ℝ≥0)
| Distinct | 6248 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12271.54302 |
|---|---|
| Minimum | 10001 |
| Maximum | 16580 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 5.1 MiB |
Quantile statistics
| Minimum | 10001 |
|---|---|
| 5-th percentile | 10174 |
| Q1 | 10936 |
| median | 12027 |
| Q3 | 13426 |
| 95-th percentile | 15119 |
| Maximum | 16580 |
| Range | 6579 |
| Interquartile range (IQR) | 2490 |
Descriptive statistics
| Standard deviation | 1564.789415 |
|---|---|
| Coefficient of variation (CV) | 0.1275136641 |
| Kurtosis | -0.7368574075 |
| Mean | 12271.54302 |
| Median Absolute Deviation (MAD) | 1215 |
| Skewness | 0.4780995787 |
| Sum | 8163631724 |
| Variance | 2448565.912 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 10083 | 1094 | 0.2% | |
| 10213 | 977 | 0.1% | |
| 10348 | 974 | 0.1% | |
| 11196 | 869 | 0.1% | |
| 11517 | 864 | 0.1% | |
| 10030 | 798 | 0.1% | |
| 12091 | 731 | 0.1% | |
| 10723 | 703 | 0.1% | |
| 10793 | 701 | 0.1% | |
| 10245 | 682 | 0.1% | |
| Other values (6238) | 656856 | 98.7% |
| Value | Count | Frequency (%) | |
| 10001 | 238 | < 0.1% | |
| 10002 | 92 | < 0.1% | |
| 10003 | 277 | < 0.1% | |
| 10004 | 255 | < 0.1% | |
| 10005 | 296 | < 0.1% |
| Value | Count | Frequency (%) | |
| 16580 | 8 | < 0.1% | |
| 16579 | 10 | < 0.1% | |
| 16578 | 8 | < 0.1% | |
| 16577 | 5 | < 0.1% | |
| 16576 | 8 | < 0.1% |
group_size
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.1 MiB |
| Value | Count | Frequency (%) | |
| 1 | 519305 | 78.1% | |
| 2 | 136393 | 20.5% | |
| 3 | 8856 | 1.3% | |
| 4 | 695 | 0.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
homeowner
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.1 MiB |
| Value | Count | Frequency (%) | |
| 1 | 356726 | 53.6% | |
| 0 | 308523 | 46.4% |
car_age
Real number (ℝ≥0)
| Distinct | 67 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8.139436512 |
|---|---|
| Minimum | 0 |
| Maximum | 85 |
| Zeros | 5805 |
| Zeros (%) | 0.9% |
| Memory size | 5.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 7 |
| Q3 | 12 |
| 95-th percentile | 18 |
| Maximum | 85 |
| Range | 85 |
| Interquartile range (IQR) | 9 |
Descriptive statistics
| Standard deviation | 5.764598035 |
|---|---|
| Coefficient of variation (CV) | 0.7082306037 |
| Kurtosis | 4.106639801 |
| Mean | 8.139436512 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 1.223618387 |
| Sum | 5414752 |
| Variance | 33.2305905 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 1 | 71650 | 10.8% | |
| 2 | 50389 | 7.6% | |
| 7 | 46752 | 7.0% | |
| 8 | 44453 | 6.7% | |
| 6 | 44363 | 6.7% | |
| 9 | 44218 | 6.6% | |
| 4 | 40991 | 6.2% | |
| 3 | 40391 | 6.1% | |
| 10 | 39301 | 5.9% | |
| 11 | 36173 | 5.4% | |
| Other values (57) | 206568 | 31.1% |
| Value | Count | Frequency (%) | |
| 0 | 5805 | 0.9% | |
| 1 | 71650 | 10.8% | |
| 2 | 50389 | 7.6% | |
| 3 | 40391 | 6.1% | |
| 4 | 40991 | 6.2% |
| Value | Count | Frequency (%) | |
| 85 | 4 | < 0.1% | |
| 75 | 6 | < 0.1% | |
| 74 | 18 | < 0.1% | |
| 65 | 5 | < 0.1% | |
| 64 | 6 | < 0.1% |
car_value
Categorical
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1531 |
| Missing (%) | 0.2% |
| Memory size | 5.1 MiB |
| Value | Count | Frequency (%) | |
| e | 219251 | 33.0% | |
| f | 177204 | 26.6% | |
| d | 113174 | 17.0% | |
| g | 98152 | 14.8% | |
| h | 28976 | 4.4% | |
| c | 20820 | 3.1% | |
| i | 3603 | 0.5% | |
| b | 1402 | 0.2% | |
| a | 1136 | 0.2% | |
| (Missing) | 1531 | 0.2% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 3 |
|---|---|
| Median length | 1 |
| Mean length | 1.004602788 |
| Min length | 1 |
risk_factor
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 240418 |
| Missing (%) | 36.1% |
| Memory size | 5.1 MiB |
| Value | Count | Frequency (%) | |
| 3 | 117571 | 17.7% | |
| 4 | 110754 | 16.6% | |
| 1 | 99476 | 15.0% | |
| 2 | 97030 | 14.6% | |
| (Missing) | 240418 | 36.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
age_oldest
Real number (ℝ≥0)
| Distinct | 58 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 44.99240284 |
|---|---|
| Minimum | 18 |
| Maximum | 75 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 5.1 MiB |
Quantile statistics
| Minimum | 18 |
|---|---|
| 5-th percentile | 22 |
| Q1 | 28 |
| median | 44 |
| Q3 | 60 |
| 95-th percentile | 75 |
| Maximum | 75 |
| Range | 57 |
| Interquartile range (IQR) | 32 |
Descriptive statistics
| Standard deviation | 17.40343996 |
|---|---|
| Coefficient of variation (CV) | 0.3868084134 |
| Kurtosis | -1.228547547 |
| Mean | 44.99240284 |
| Median Absolute Deviation (MAD) | 16 |
| Skewness | 0.2257477638 |
| Sum | 29931151 |
| Variance | 302.8797225 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 75 | 45211 | 6.8% | |
| 24 | 21627 | 3.3% | |
| 23 | 21489 | 3.2% | |
| 25 | 21278 | 3.2% | |
| 22 | 19692 | 3.0% | |
| 26 | 18679 | 2.8% | |
| 27 | 16046 | 2.4% | |
| 28 | 15180 | 2.3% | |
| 21 | 14255 | 2.1% | |
| 29 | 13179 | 2.0% | |
| Other values (48) | 458613 | 68.9% |
| Value | Count | Frequency (%) | |
| 18 | 1666 | 0.3% | |
| 19 | 6592 | 1.0% | |
| 20 | 10180 | 1.5% | |
| 21 | 14255 | 2.1% | |
| 22 | 19692 | 3.0% |
| Value | Count | Frequency (%) | |
| 75 | 45211 | 6.8% | |
| 74 | 5646 | 0.8% | |
| 73 | 5861 | 0.9% | |
| 72 | 6194 | 0.9% | |
| 71 | 6825 | 1.0% |
age_youngest
Real number (ℝ≥0)
| Distinct | 60 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 42.57758824 |
|---|---|
| Minimum | 16 |
| Maximum | 75 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 5.1 MiB |
Quantile statistics
| Minimum | 16 |
|---|---|
| 5-th percentile | 20 |
| Q1 | 26 |
| median | 40 |
| Q3 | 57 |
| 95-th percentile | 75 |
| Maximum | 75 |
| Range | 59 |
| Interquartile range (IQR) | 31 |
Descriptive statistics
| Standard deviation | 17.46043237 |
|---|---|
| Coefficient of variation (CV) | 0.4100850492 |
| Kurtosis | -1.146121776 |
| Mean | 42.57758824 |
| Median Absolute Deviation (MAD) | 15 |
| Skewness | 0.3624556029 |
| Sum | 28324698 |
| Variance | 304.8666986 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 75 | 34262 | 5.2% | |
| 23 | 24208 | 3.6% | |
| 24 | 23635 | 3.6% | |
| 25 | 23509 | 3.5% | |
| 22 | 22290 | 3.4% | |
| 26 | 20510 | 3.1% | |
| 27 | 17718 | 2.7% | |
| 21 | 17311 | 2.6% | |
| 28 | 16832 | 2.5% | |
| 29 | 14437 | 2.2% | |
| Other values (50) | 450537 | 67.7% |
| Value | Count | Frequency (%) | |
| 16 | 4478 | 0.7% | |
| 17 | 2967 | 0.4% | |
| 18 | 5093 | 0.8% | |
| 19 | 10182 | 1.5% | |
| 20 | 13480 | 2.0% |
| Value | Count | Frequency (%) | |
| 75 | 34262 | 5.2% | |
| 74 | 5048 | 0.8% | |
| 73 | 5152 | 0.8% | |
| 72 | 5234 | 0.8% | |
| 71 | 6102 | 0.9% |
married_couple
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.1 MiB |
| Value | Count | Frequency (%) | |
| 0 | 525692 | 79.0% | |
| 1 | 139557 | 21.0% |
C_previous
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 18711 |
| Missing (%) | 2.8% |
| Memory size | 5.1 MiB |
| Value | Count | Frequency (%) | |
| 3 | 271160 | 40.8% | |
| 1 | 172007 | 25.9% | |
| 2 | 109184 | 16.4% | |
| 4 | 94187 | 14.2% | |
| (Missing) | 18711 | 2.8% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
duration_previous
Real number (ℝ≥0)
| Distinct | 16 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 18711 |
| Missing (%) | 2.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.003773947 |
|---|---|
| Minimum | 0 |
| Maximum | 15 |
| Zeros | 24926 |
| Zeros (%) | 3.7% |
| Memory size | 5.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 5 |
| Q3 | 9 |
| 95-th percentile | 15 |
| Maximum | 15 |
| Range | 15 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 4.680792915 |
|---|---|
| Coefficient of variation (CV) | 0.7796417648 |
| Kurtosis | -0.6492855709 |
| Mean | 6.003773947 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.7522515321 |
| Sum | 3881668 |
| Variance | 21.90982232 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 1 | 81570 | 12.3% | |
| 15 | 79849 | 12.0% | |
| 2 | 79595 | 12.0% | |
| 3 | 70800 | 10.6% | |
| 4 | 57485 | 8.6% | |
| 5 | 49372 | 7.4% | |
| 6 | 45379 | 6.8% | |
| 7 | 37768 | 5.7% | |
| 8 | 30752 | 4.6% | |
| 9 | 26244 | 3.9% | |
| Other values (6) | 87724 | 13.2% |
| Value | Count | Frequency (%) | |
| 0 | 24926 | 3.7% | |
| 1 | 81570 | 12.3% | |
| 2 | 79595 | 12.0% | |
| 3 | 70800 | 10.6% | |
| 4 | 57485 | 8.6% |
| Value | Count | Frequency (%) | |
| 15 | 79849 | 12.0% | |
| 14 | 9739 | 1.5% | |
| 13 | 10963 | 1.6% | |
| 12 | 11284 | 1.7% | |
| 11 | 12718 | 1.9% |
A
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.1 MiB |
| Value | Count | Frequency (%) | |
| 1 | 426067 | 64.0% | |
| 0 | 143691 | 21.6% | |
| 2 | 95491 | 14.4% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
B
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.1 MiB |
| Value | Count | Frequency (%) | |
| 0 | 363069 | 54.6% | |
| 1 | 302180 | 45.4% |
C
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.1 MiB |
| Value | Count | Frequency (%) | |
| 3 | 271607 | 40.8% | |
| 1 | 202945 | 30.5% | |
| 2 | 133468 | 20.1% | |
| 4 | 57229 | 8.6% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
D
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.1 MiB |
| Value | Count | Frequency (%) | |
| 3 | 408839 | 61.5% | |
| 2 | 149793 | 22.5% | |
| 1 | 106617 | 16.0% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
E
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.1 MiB |
| Value | Count | Frequency (%) | |
| 0 | 369085 | 55.5% | |
| 1 | 296164 | 44.5% |
F
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.1 MiB |
| Value | Count | Frequency (%) | |
| 2 | 255806 | 38.5% | |
| 0 | 216395 | 32.5% | |
| 1 | 158613 | 23.8% | |
| 3 | 34435 | 5.2% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
G
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.1 MiB |
| Value | Count | Frequency (%) | |
| 2 | 265237 | 39.9% | |
| 3 | 191163 | 28.7% | |
| 1 | 141946 | 21.3% | |
| 4 | 66903 | 10.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
cost
Real number (ℝ≥0)
| Distinct | 531 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 635.7850083 |
|---|---|
| Minimum | 260 |
| Maximum | 922 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 5.1 MiB |
Quantile statistics
| Minimum | 260 |
|---|---|
| 5-th percentile | 564 |
| Q1 | 605 |
| median | 635 |
| Q3 | 665 |
| 95-th percentile | 712 |
| Maximum | 922 |
| Range | 662 |
| Interquartile range (IQR) | 60 |
Descriptive statistics
| Standard deviation | 45.99375791 |
|---|---|
| Coefficient of variation (CV) | 0.0723416836 |
| Kurtosis | 0.9558013252 |
| Mean | 635.7850083 |
| Median Absolute Deviation (MAD) | 30 |
| Skewness | 0.09154557472 |
| Sum | 422955341 |
| Variance | 2115.425766 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 633 | 6024 | 0.9% | |
| 637 | 5985 | 0.9% | |
| 640 | 5935 | 0.9% | |
| 626 | 5934 | 0.9% | |
| 638 | 5916 | 0.9% | |
| 639 | 5912 | 0.9% | |
| 629 | 5896 | 0.9% | |
| 642 | 5894 | 0.9% | |
| 635 | 5875 | 0.9% | |
| 623 | 5854 | 0.9% | |
| Other values (521) | 606024 | 91.1% |
| Value | Count | Frequency (%) | |
| 260 | 1 | < 0.1% | |
| 263 | 4 | < 0.1% | |
| 264 | 1 | < 0.1% | |
| 272 | 2 | < 0.1% | |
| 274 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 922 | 1 | < 0.1% | |
| 917 | 1 | < 0.1% | |
| 912 | 3 | < 0.1% | |
| 911 | 1 | < 0.1% | |
| 900 | 4 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
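The calculation described above can be sketched in plain Python. The example is illustrative only: it uses the (age_oldest, age_youngest) pairs of the five customers visible in the sample rows of this report, not the full dataset, so the resulting value is not the one behind the high-correlation warning. The function name `pearson_r` is an assumption made for this sketch.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n normalisation factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)

# One (age_oldest, age_youngest) pair per customer from the sample rows.
age_oldest = [46, 28, 23, 39, 20]
age_youngest = [42, 28, 23, 39, 20]
r = pearson_r(age_oldest, age_youngest)

# Invariance under separate (positive) changes of location and scale.
r_rescaled = pearson_r([2 * x + 10 for x in age_oldest],
                       [0.5 * y - 3 for y in age_youngest])

print(f"r = {r:.4f}, after rescaling = {r_rescaled:.4f}")
```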
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
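A minimal sketch of that rank-based calculation follows, assuming ties receive the average of their positions (a common convention, not confirmed by this report). The data are synthetic: y is a cubic function of x, so the relationship is monotonic but not linear. All function names are illustrative.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic, nonlinear
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(f"Spearman rho = {rho:.4f}, Pearson r = {r:.4f}")
```

On this synthetic input the ranks of x and y coincide, so ρ reaches its maximum while r stays below it.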
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
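The pair-counting definition above corresponds to the variant often called tau-a. The sketch below implements exactly that definition on synthetic data; statistical libraries commonly default to a tie-adjusted variant (tau-b), so results may differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1      # both variables move in the same direction
        elif product < 0:
            discordant += 1      # the variables move in opposite directions
        # product == 0 is a tie in x or y and counts toward neither.
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]   # two adjacent swaps relative to x
tau = kendall_tau_a(x, y)
print(f"tau = {tau:.2f}")
```

Here 8 of the 10 pairs are concordant and 2 are discordant.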
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
First rows
| | customer_ID | shopping_pt | record_type | day | time | state | location | group_size | homeowner | car_age | car_value | risk_factor | age_oldest | age_youngest | married_couple | C_previous | duration_previous | A | B | C | D | E | F | G | cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10000000 | 1 | 0 | 0 | 08:35 | IN | 10001 | 2 | 0 | 2 | g | 3.0 | 46 | 42 | 1 | 1.0 | 2.0 | 1 | 0 | 2 | 2 | 1 | 2 | 2 | 633 |
| 1 | 10000000 | 2 | 0 | 0 | 08:38 | IN | 10001 | 2 | 0 | 2 | g | 3.0 | 46 | 42 | 1 | 1.0 | 2.0 | 1 | 0 | 2 | 2 | 1 | 2 | 1 | 630 |
| 2 | 10000000 | 3 | 0 | 0 | 08:38 | IN | 10001 | 2 | 0 | 2 | g | 3.0 | 46 | 42 | 1 | 1.0 | 2.0 | 1 | 0 | 2 | 2 | 1 | 2 | 1 | 630 |
| 3 | 10000000 | 4 | 0 | 0 | 08:39 | IN | 10001 | 2 | 0 | 2 | g | 3.0 | 46 | 42 | 1 | 1.0 | 2.0 | 1 | 0 | 2 | 2 | 1 | 2 | 1 | 630 |
| 4 | 10000000 | 5 | 0 | 0 | 11:55 | IN | 10001 | 2 | 0 | 2 | g | 3.0 | 46 | 42 | 1 | 1.0 | 2.0 | 1 | 0 | 2 | 2 | 1 | 2 | 1 | 630 |
| 5 | 10000000 | 6 | 0 | 0 | 11:57 | IN | 10001 | 2 | 0 | 2 | g | 3.0 | 46 | 42 | 1 | 1.0 | 2.0 | 1 | 0 | 2 | 2 | 1 | 2 | 1 | 638 |
| 6 | 10000000 | 7 | 0 | 0 | 11:58 | IN | 10001 | 2 | 0 | 2 | g | 3.0 | 46 | 42 | 1 | 1.0 | 2.0 | 1 | 0 | 2 | 2 | 1 | 2 | 1 | 638 |
| 7 | 10000000 | 8 | 0 | 0 | 12:03 | IN | 10001 | 2 | 0 | 2 | g | 3.0 | 46 | 42 | 1 | 1.0 | 2.0 | 1 | 0 | 2 | 2 | 1 | 2 | 1 | 638 |
| 8 | 10000000 | 9 | 1 | 0 | 12:07 | IN | 10001 | 2 | 0 | 2 | g | 3.0 | 46 | 42 | 1 | 1.0 | 2.0 | 1 | 0 | 2 | 2 | 1 | 2 | 1 | 634 |
| 9 | 10000005 | 1 | 0 | 3 | 08:56 | NY | 10006 | 1 | 0 | 10 | e | 4.0 | 28 | 28 | 0 | 3.0 | 13.0 | 1 | 1 | 3 | 3 | 1 | 0 | 2 | 755 |
Last rows
| | customer_ID | shopping_pt | record_type | day | time | state | location | group_size | homeowner | car_age | car_value | risk_factor | age_oldest | age_youngest | married_couple | C_previous | duration_previous | A | B | C | D | E | F | G | cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 665239 | 10152721 | 6 | 1 | 4 | 10:14 | CT | 11888 | 1 | 0 | 8 | e | NaN | 23 | 23 | 0 | 4.0 | 5.0 | 1 | 0 | 3 | 3 | 1 | 0 | 2 | 716 |
| 665240 | 10152723 | 1 | 0 | 2 | 15:13 | FL | 10711 | 1 | 1 | 0 | f | 2.0 | 39 | 39 | 0 | 3.0 | 2.0 | 0 | 0 | 2 | 1 | 0 | 0 | 4 | 656 |
| 665241 | 10152723 | 2 | 0 | 2 | 15:14 | FL | 10711 | 1 | 1 | 0 | f | 2.0 | 39 | 39 | 0 | 3.0 | 2.0 | 1 | 0 | 3 | 3 | 0 | 2 | 3 | 687 |
| 665242 | 10152723 | 3 | 1 | 1 | 10:30 | FL | 10711 | 1 | 1 | 0 | g | 2.0 | 39 | 39 | 0 | 3.0 | 7.0 | 1 | 0 | 3 | 3 | 1 | 2 | 3 | 651 |
| 665243 | 10152724 | 1 | 0 | 3 | 13:42 | KY | 10204 | 1 | 1 | 1 | e | NaN | 20 | 20 | 0 | 1.0 | 4.0 | 0 | 0 | 3 | 3 | 0 | 0 | 2 | 642 |
| 665244 | 10152724 | 2 | 0 | 3 | 13:43 | KY | 10204 | 1 | 1 | 1 | e | NaN | 20 | 20 | 0 | 1.0 | 4.0 | 1 | 0 | 2 | 3 | 0 | 2 | 2 | 677 |
| 665245 | 10152724 | 3 | 0 | 3 | 13:43 | KY | 10204 | 1 | 1 | 1 | e | NaN | 20 | 20 | 0 | 1.0 | 4.0 | 1 | 0 | 2 | 3 | 0 | 2 | 2 | 677 |
| 665246 | 10152724 | 4 | 0 | 3 | 13:44 | KY | 10204 | 1 | 1 | 1 | e | NaN | 20 | 20 | 0 | 1.0 | 4.0 | 1 | 0 | 2 | 3 | 0 | 2 | 2 | 677 |
| 665247 | 10152724 | 5 | 0 | 3 | 13:46 | KY | 10204 | 1 | 1 | 1 | e | NaN | 20 | 20 | 0 | 1.0 | 4.0 | 1 | 0 | 2 | 3 | 0 | 2 | 2 | 685 |
| 665248 | 10152724 | 6 | 1 | 1 | 15:14 | KY | 10204 | 1 | 1 | 1 | d | NaN | 20 | 20 | 0 | 4.0 | 4.0 | 1 | 0 | 3 | 3 | 0 | 2 | 2 | 681 |